The Global Water Access Gap

Introduction

Data

Add Tab Name (Alastair)

tab within tab if needed

tab within tab if needed

Add Tab Name (Siyi)

tab within tab if needed

tab within tab if needed

Add Tab Name (Jamie)

tab within tab if needed

tab within tab if needed

Advocacy for water access around the world (Masahiro)

Introduction

In this tab, we look at tweets advocating for greater access to water around the world in order to discover trends among them. To gather the tweets, we ran the search_tweets() function on May 4th and May 7th, and the tweets posted roughly between April 27th and May 7th were recorded in a single dataset. Every included tweet contains at least one of the following phrases: “water access,” “Water access,” “Water Access,” “access to clean water,” or “access to drinking water.” For more details, see the “Wrangling - Masahiro” file in the same repo. By exploring the following three questions with data, we aim to learn what kind of rhetoric people employ when calling for more access to water around the world.

  • What are the common words used in the tweets requesting more access to clean water around the world?
  • What are the common sentiments of the words observed in those tweets?
  • What do those common words and sentiments imply about the rhetoric people use to argue for clean water in some of the regions lacking water access?

In addition to removing so-called stop words from the dataset, we also omitted the word “access,” because the way we collected the data means that every tweet contains it. Doing so helps us produce more meaningful word clouds and sentiment analyses.
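The filtering step described here can be sketched in base R. This is only an illustration under stated assumptions: the two tweets are made up, and `drop_words` is a tiny hypothetical stand-in for the full stop-word list used in the project (see the wrangling file for the actual code).

```r
# Toy tweets (hypothetical; not taken from the project's dataset)
tweets <- c(
  "Access to clean water is a human right https://example.org",
  "We need water access for every village in India"
)

# Tiny stand-in stop-word list (assumption), plus "access", which every
# collected tweet contains because of how the search phrases were chosen
drop_words <- c("to", "is", "a", "we", "for", "every", "in", "access")

# Lower-case, split on anything that is not a letter or digit, drop empties
tokens <- unlist(strsplit(tolower(tweets), "[^a-z0-9]+"))
tokens <- tokens[nzchar(tokens)]

# Keep only the words that are not in the drop list, then count them
kept <- tokens[!tokens %in% drop_words]
word_counts <- sort(table(kept), decreasing = TRUE)
print(word_counts)
```

Note that splitting a URL this way leaves a standalone “https” token, which is one plausible reason such a token can appear in a word cloud built from tweets.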

Word Cloud

First, we examine a word cloud of all the words except those removed during data wrangling, to get a sense of the most common words in the focal tweets.

In the above word cloud, “https” stands out in size, which implies that many tweets related to water access advocacy link to or cite other web resources. “clean” is also displayed prominently, partly because “access to clean water” is one of the phrases we searched for when collecting tweets. However, we also searched for “access to drinking water,” and “drinking” does not appear nearly as large as “clean,” so the word “clean” seems to carry particular weight in arguments for greater water access. Among the words displayed at smaller sizes, many relate to potential uses of water or implications of access to it: “sanitation,” “healthcare,” “health,” “food,” and “hygiene.” Another interesting word in the cloud is “india,” whose presence may be attributable to India’s socioeconomic standing or its especially large population. Finally, we found it intriguing that “covid” appears in the visualization, because it suggests that tweets about water access were often associated with the pandemic, even though the connection between the infectious disease and water access did not seem explicit or obvious.

Sentiment Analysis

Next, we dive into the sentiments reflected in the English used by those advocating for water access on Twitter. We use the NRC lexicon to attach sentiments to the words observed in the tweets, and visualize the most common sentiments with the following graph.

As can be seen, positive, trust, and joy are the most common sentiments among the words in the tweets. Negative follows those top three, and the least common sentiments, such as anger, anticipation, fear, and sadness, come after it. This bar chart confirms that many words in the analyzed tweets carry positive connotations, meaning not only “positive” as a sentiment but also “trust” and “joy.” To learn more about the words tagged with these sentiments, we use a comparison cloud (see the next tab).
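The tallying behind a chart like this can be sketched as a join between the tweet words and a lexicon, followed by a count per sentiment. The sketch below is an assumption-laden illustration: `toy_lexicon` is a hand-made table with one row per word and sentiment pair, standing in for the NRC lexicon, and the words are made up.

```r
# Words after tokenizing and stop-word removal (hypothetical toy data)
words <- data.frame(
  word = c("clean", "clean", "safe", "disease", "water"),
  stringsAsFactors = FALSE
)

# Hand-made stand-in for a sentiment lexicon (assumption): one row per
# word/sentiment pair, so a single word may carry several sentiments
toy_lexicon <- data.frame(
  word      = c("clean", "clean", "clean", "safe", "safe", "disease"),
  sentiment = c("positive", "trust", "joy", "positive", "trust", "negative"),
  stringsAsFactors = FALSE
)

# Inner join: words absent from the lexicon (here "water") drop out, and each
# occurrence of a word is repeated once per sentiment it carries
tagged <- merge(words, toy_lexicon, by = "word")

# Count how many tagged word occurrences fall under each sentiment
sentiment_counts <- sort(table(tagged$sentiment), decreasing = TRUE)
print(sentiment_counts)
```

Because one word can map to several sentiments, the counts describe word and sentiment pairs rather than tweets, which is worth keeping in mind when reading the bar chart.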

Comparison Cloud

The comparison cloud below displays the words that are commonly used in the text collected from Twitter and that also carry the sentiment “positive,” “trust,” or “joy.” Before turning to the visualization itself, we lay out how the code below works. A comparison cloud accomplishes two goals at once: comparing the relative frequency of words and classifying the most commonly used words into categories. To build one, the data must be transformed into a matrix whose columns correspond to the categories (in this case, the sentiments) and whose rows are named by word. Creating that matrix required a fair amount of wrangling to produce a dataset with one row per word and one column per sentiment. If interested, see the commented code below.

# packages used below (dplyr for the wrangling verbs, tidyr for spread())
library(dplyr)
library(tidyr)
# preliminary wrangling below
# first extract words with the connotations of interest
# tweets_sentiment = dataset used for sentiment analysis
pure_words <- tweets_sentiment %>%
  filter(sentiment %in% c("positive", "trust", "joy")) %>%
  # then collapse the rows so that each word only occupies a single row
  group_by(word) %>%
  summarize()
# now prepare the dataset to be joined with the dataset about the count of
# each word with the three focal sentiments
pure_words_copied <- pure_words %>%
  # let each word occupy three rows at the same time
  slice(rep(1:n(), each = 3)) %>%
  mutate(number = row_number()) %>%
  # list up all the sentiments of interest
  mutate(sentiment = case_when(number %% 3 == 1 ~ "positive",
                               number %% 3 == 2 ~ "trust",
                               number %% 3 == 0 ~ "joy")) %>%
  select(word, sentiment)
# the below dataset is about the count of each word with the three connotations
# of interest
comparison_words_prep <- tweets_sentiment %>%
  # extract those with the three sentiments of interest
  filter(sentiment %in% c("positive", "trust", "joy")) %>%
  # and count the frequency
  group_by(word, sentiment) %>%
  summarize(N = n())
comparison_words_prep_2 <- pure_words_copied %>%
  # join the dataset with the data about the count (used for the bar)
  left_join(comparison_words_prep, by = c("word", "sentiment")) %>%
  # if a word does not carry a given sentiment, the join leaves NA in N,
  # so turn it into 0
  mutate(count = case_when(is.na(N) ~ 0,
                           TRUE ~ as.numeric(N))) %>%
  select(word, sentiment, count)
# one last step to make each column refer to each sentiment
comparison_words_prep_3 <- comparison_words_prep_2 %>%
  spread(key = sentiment, value = count)
# the below code translates the data frame into a matrix, and each row name of
# the matrix should correspond to the word
comparison_words <- comparison_words_prep_3 %>%
  select(-word) %>%
  as.matrix()
rownames(comparison_words) <- comparison_words_prep_3$word

# create the comparison cloud (comparison.cloud() comes from the wordcloud package)
library(wordcloud)
colors1 <- c("#48F11F", "#1226D2", "#CB0A3E")
colors2 <- c("#CCFF99", "#7F88EF", "#EF7FCA")
comparison.cloud(comparison_words, max.words = 100,
                 random.order = FALSE,
                 colors = colors1,
                 title.colors = colors1,
                 title.bg.colors = colors2)
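The wrangling above builds a word-by-sentiment matrix in several explicit steps. As a hedged aside, a matrix of the same shape can also be obtained with base R's table(), which fills missing word and sentiment combinations with 0 automatically. The data frame below is a made-up stand-in for tweets_sentiment, and this is a sketch of an alternative, not the code used for the cloud in this post.

```r
# Hypothetical stand-in for tweets_sentiment: one row per word occurrence
# and sentiment tag
tweets_sentiment_toy <- data.frame(
  word      = c("clean", "clean", "clean", "safe", "safe", "food", "disease"),
  sentiment = c("positive", "trust", "joy", "positive", "trust",
                "positive", "negative"),
  stringsAsFactors = FALSE
)

# Keep only the three focal sentiments
focal <- c("positive", "trust", "joy")
kept <- tweets_sentiment_toy[tweets_sentiment_toy$sentiment %in% focal, ]

# Cross-tabulate: rows are words, columns are sentiments, cells are counts;
# combinations that never occur are filled with 0
comparison_toy <- unclass(table(kept$word, kept$sentiment))
print(comparison_toy)

# The matrix could then be passed to the plotting call, for example:
# wordcloud::comparison.cloud(comparison_toy, max.words = 100)
```

The explicit version in the post has the advantage of showing every intermediate dataset, which makes it easier to check each step by eye.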

As in the first word cloud, “clean” stands out in this comparison cloud, as shown by its large size. As a category, however, words classified as positive have the largest presence in the analyzed tweets, as the bar chart in the previous tab showed, so “clean” alone is not frequent enough to dominate the text analysis conducted here. Looking more closely at the visualization above, we noticed many words related to potential outcomes of greater access to water around the world: “food,” “healthy,” “save,” “green,” “income,” “medical,” “safe,” “luxury,” and “survive.” This finding resonates with the insights from the original word cloud, since both visualizations show many words associated with the promising implications of access to water. The comparison cloud also shows that the tweets contain a number of words related to the process of ensuring water access for underprivileged people: “advocate,” “guarantee,” “partnership,” “supporting,” “improving,” “conservation,” and “providing.” This suggests that describing the steps needed to secure water access around the world leads these tweets to include many words with positive connotations, such as positivity, trust, or joy.

Discussion

Through the general word cloud, the bar chart, and the comparison cloud, this research has shown that tweets requesting greater access to water across the world use many words that connote positivity, joy, and trust, and in particular many words related to the potential outcomes of access to water, such as “sanitation” or “food.” We believe this may be because many of these tweets describe and discuss how securing water access can improve the lives of people in developing countries, or what such access enables. This explanation is convincing to some extent, given that the comparison cloud showed many words associated with the process of improving water access, such as “partnership” or “donation.”

In other words, this study suggests that tweets arguing for water access around the world do not engage with negative words, such as death or disease, as much as they do with words tagged with the sentiments “positive,” “joy,” and “trust.” This implies that tweets advocating for water access may talk more about how greater access can resolve problems, for example by improving sanitation, food access, and safety in some areas, than about how a lack of water causes disease, death, conflict, or other suffering. We find this speculation fairly plausible given the results above, and we find it intriguing that people emphasize the positive aspects of securing clean water more than the negative consequences of lacking it.

However, we also acknowledge that findings generated with word clouds have limitations. The word cloud, bar chart, and comparison cloud here are all generated after splitting the tweets into individual words. In other words, we are not analyzing sentences, so we cannot distinguish between the following two phrases: “today’s effort for greater water access can improve sanitation around the world,” and “today’s effort for greater water access does not improve sanitation around the world.” The two phrases contain an almost identical set of words, and the negative connotation of the latter is almost entirely due to the word “not,” which would have been removed as a stop word at the beginning of the data wrangling. Our analysis therefore cannot tell the sentiments of the two phrases apart. Our findings do identify common words among the tweets of interest, point to positive, joy, and trust as common sentiments, and suggest explanations of people’s rhetoric that are consistent with the visualizations. In short, as a blog project, we are confident that this text analysis of tweets has given substantial new perspectives on people’s discourse about greater water access around the world. However, more text-analysis techniques are needed to generate more accurate and meaningful findings; future research could analyze these tweets not only as a set of words but also as a collection of bigrams or larger units of English words, in order to build upon and expand the findings here.
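The limitation can be made concrete with a small base R sketch. It assumes a three-word stop list (`stop_list`, a hypothetical stand-in for a full stop-word lexicon) and two shortened versions of the example phrases; the helper functions `tokenize()` and `bigrams()` are written for this illustration only.

```r
# Split text into lower-case word tokens
tokenize <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens[nzchar(tokens)]
}

# Build adjacent word pairs (bigrams) from a token vector
bigrams <- function(tokens) {
  if (length(tokens) < 2) return(character(0))
  paste(head(tokens, -1), tail(tokens, -1))
}

s1 <- "Greater water access can improve sanitation"
s2 <- "Greater water access does not improve sanitation"

# Stand-in stop-word list (assumption); note that it contains "not"
stop_list <- c("can", "does", "not")

# After stop-word removal, the two sentences reduce to the same words
bag1 <- setdiff(tokenize(s1), stop_list)
bag2 <- setdiff(tokenize(s2), stop_list)
same_bag <- identical(bag1, bag2)

# Bigrams built before stop-word removal keep the negation visible
has_negation_s1 <- "not improve" %in% bigrams(tokenize(s1))
has_negation_s2 <- "not improve" %in% bigrams(tokenize(s2))

print(c(same_bag = same_bag, s1 = has_negation_s1, s2 = has_negation_s2))
```

A bigram-based analysis would still need a rule for how to score a pair such as “not improve,” so this only shows where the extra information lives, not how to use it.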
